1. Loading the Data

2. Exploratory Data Analysis

a. Comparison among the data sets

Monthly data

The following features can be seen in the figure below.
  • Seasonality of the one-year period can be observed in some drugs (especially in N02BE, R03, and R06).
  • Each drug has different cycles and there are no uniform seasonality.
  • There is a significant drop in sales in January 2017 and October 2019.
# time plot
df_drug_monthly |> 
  autoplot(Sales) + 
  labs(title = "Monthly sales of drugs") +
  facet_wrap(vars(Drug), scales = "free_y", ncol = 2) +
  theme(legend.position = "none")

# Seasonal subseries plots
df_drug_monthly |> 
  gg_subseries(Sales) + 
  labs(title = "Monthly sales of drugs") +
  theme(legend.position = "none")

df_drug_monthly |> 
  ggplot(aes(x = as.factor(month(Month)), y = Sales)) +
  geom_boxplot() +
  facet_wrap(~ Drug, scales = "free_y", ncol = 2)

df_drug_monthly |> 
  gg_season(Sales)

Weekly data

The following features can be seen in the figure below.
  • Seasonality of the one-year period can also be observed in some drugs (especially in N02BE, R03, and R06) as we can see from the monthly data.
  • There is no significant drop in sales in January 2017 and October 2019, which means there may be some problems with aggregation.
# time plot
df_drug_weekly |> 
  autoplot(Sales) + 
  labs(title = "weekly sales of drugs") +
  facet_wrap(vars(Drug), scales = "free_y", ncol = 2) +
  theme(legend.position = "none")

# Seasonal subseries plots
df_drug_weekly |> 
  gg_subseries(Sales) + 
  labs(title = "weekly sales of drugs") +
  theme(legend.position = "none")

df_drug_weekly |> 
  ggplot(aes(x = as.factor(week(Week)), y = Sales)) +
  geom_boxplot() +
  facet_wrap(~ Drug, scales = "free_y", ncol = 1)

df_drug_weekly |> 
  gg_season(Sales)

Daily data

The following features can be seen in the figure below.
  • We cannot observe any weekly seasonality from the daily data.
# time plot
df_drug_daily |> 
  autoplot(Sales) + 
  labs(title = "daily sales of drugs") +
  facet_wrap(vars(Drug), scales = "free_y", ncol = 2) +
  theme(legend.position = "none")

# Seasonal subseries plots
df_drug_daily |> 
  gg_subseries(Sales, period = "week") + 
  labs(title = "daily sales of drugs") +
  theme(legend.position = "none")

df_drug_daily |> 
  ggplot(aes(x = weekdays(Date), y = Sales)) +
  geom_boxplot() +
  facet_wrap(~ Drug, scales = "free_y", ncol = 2)

df_drug_daily |> 
  gg_season(Sales, period = "week")

df_drug_daily |> 
  gg_season(Sales, period = "month")

df_drug_daily |> 
  gg_season(Sales, period = "year")

Hourly data

The following features can be seen in the figure below.
  • We can observe daily seasonality from the data however it can be estimated that it just comes from the business hours.
# time plot
df_drug_hourly |> 
  autoplot(Sales) + 
  labs(title = "hourly sales of drugs") +
  facet_wrap(vars(Drug), scales = "free_y", ncol = 1) +
  theme(legend.position = "none")

# Seasonal subseries plots
df_drug_hourly |> 
  gg_subseries(Sales, period = "day") + 
  labs(title = "daily sales of drugs") +
  theme(legend.position = "none")

df_drug_hourly |> 
  ggplot(aes(x = as.factor(hour(Datetime)), y = Sales)) +
  geom_boxplot() +
  facet_wrap(~ Drug, scales = "free_y", ncol = 1)

df_drug_hourly |> 
  gg_season(Sales, period = "day")

From the above, it can be estimated that this data has a one-year seasonality, and therefore, weekly or monthly data are candidates for data that can be utilized in the forecast. Since there are some missing data of unknown reason in the monthly data, it is appropriate to use the weekly data in this analysis.

b. Moving average smoothing and Decomposition

In the following, weekly data is used to perform Moving average smoothing and STL Decomposition.

## Moving average smoothing

# add the 52-MA and 52x2-MA to the data set
df_drug_weekly <- df_drug_weekly |> 
  group_by(Drug) |> 
  mutate(
    MA_52 = slider::slide_dbl(Sales, mean, 
                              .before = 25, .after = 26, .complete = TRUE),
    MA_52x2 = slider::slide_dbl(MA_52, mean, 
                              .before = 1, .after = 0, .complete = TRUE)
    ) |> 
  ungroup()

# plot 52x2-MA
df_drug_weekly |> 
  autoplot(Sales) +
  geom_line(aes(y = MA_52x2, color = "52-week Moving Average"), size =1) +
  labs(title = "Weekly Sales of Drugs with 52-week Moving Average") +
  facet_wrap(vars(Drug), scales = "free_y", ncol = 2) +
  theme(legend.position = "bottom")
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## Warning: Removed 52 rows containing missing values or values outside the scale range
## (`geom_line()`).

Add some implications from the moving average smoothing.

## Decomposition

# function for Classical additive decomposition
CL_dcmp <- function(drug_code) {
  df_drug_weekly |>
    filter(Drug == drug_code) |>
    model(
      classical_decomposition(Sales, type = "additive")
    ) |>
    components() |>
    autoplot() +
    ggtitle(paste("Classical additive decomposition for Drug:", drug_code))
}

# apply Classical additive to all unique drug code
lapply(unique(df_drug_weekly$Drug), CL_dcmp)
## [[1]]
## Warning: Removed 26 rows containing missing values or values outside the scale range
## (`geom_line()`).

## 
## [[2]]
## Warning: Removed 26 rows containing missing values or values outside the scale range
## (`geom_line()`).

## 
## [[3]]
## Warning: Removed 26 rows containing missing values or values outside the scale range
## (`geom_line()`).

## 
## [[4]]
## Warning: Removed 26 rows containing missing values or values outside the scale range
## (`geom_line()`).

## 
## [[5]]
## Warning: Removed 26 rows containing missing values or values outside the scale range
## (`geom_line()`).

## 
## [[6]]
## Warning: Removed 26 rows containing missing values or values outside the scale range
## (`geom_line()`).

## 
## [[7]]
## Warning: Removed 26 rows containing missing values or values outside the scale range
## (`geom_line()`).

## 
## [[8]]
## Warning: Removed 26 rows containing missing values or values outside the scale range
## (`geom_line()`).

# function for STL decomposition
STL_dcmp <- function(drug_code) {
  df_drug_weekly |>
    filter(Drug == drug_code) |>
    model(
      STL(Sales ~ trend(window = 7) +
            season(window = "periodic"),
          robust = TRUE)
    ) |>
    components() |>
    autoplot() +
    ggtitle(paste("STL Decomposition for Drug:", drug_code))
}

# apply STL decomposition to all unique drug code
lapply(unique(df_drug_weekly$Drug), STL_dcmp)
## [[1]]

## 
## [[2]]

## 
## [[3]]

## 
## [[4]]

## 
## [[5]]

## 
## [[6]]

## 
## [[7]]

## 
## [[8]]

Add some implications from the decomposition.

c. Stationary Analysis

In the following, weekly data is used.

## ACF and PACF
# ACF plot
df_drug_weekly |> 
  ACF(Sales) |> 
  autoplot() +
  facet_wrap(vars(Drug), scales = "free_y", ncol = 2) 

# PACF plot
df_drug_weekly |> 
  PACF(Sales) |> 
  autoplot() +
  facet_wrap(vars(Drug), scales = "free_y", ncol = 2) 

Add some implications .

d, ADF test

# function for ADF test
ADF_test <- function(drug_code) {
  print(drug_code)
  df_drug_weekly |>
    filter(Drug == drug_code) |>
    pull(Sales) |> 
    tseries::adf.test() |> 
    print()
}

# apply ADF test to all unique drug code
for (drug_code in unique(df_drug_weekly$Drug)) {
  ADF_test(drug_code)
}
## [1] "M01AB"
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
## 
##  Augmented Dickey-Fuller Test
## 
## data:  pull(filter(df_drug_weekly, Drug == drug_code), Sales)
## Dickey-Fuller = -3.6996, Lag order = 6, p-value = 0.02442
## alternative hypothesis: stationary
## 
## [1] "M01AE"
## 
##  Augmented Dickey-Fuller Test
## 
## data:  pull(filter(df_drug_weekly, Drug == drug_code), Sales)
## Dickey-Fuller = -5.4432, Lag order = 6, p-value = 0.01
## alternative hypothesis: stationary
## 
## [1] "N02BA"
## 
##  Augmented Dickey-Fuller Test
## 
## data:  pull(filter(df_drug_weekly, Drug == drug_code), Sales)
## Dickey-Fuller = -4.6998, Lag order = 6, p-value = 0.01
## alternative hypothesis: stationary
## 
## [1] "N02BE"
## 
##  Augmented Dickey-Fuller Test
## 
## data:  pull(filter(df_drug_weekly, Drug == drug_code), Sales)
## Dickey-Fuller = -3.9667, Lag order = 6, p-value = 0.01106
## alternative hypothesis: stationary
## 
## [1] "N05B"
## 
##  Augmented Dickey-Fuller Test
## 
## data:  pull(filter(df_drug_weekly, Drug == drug_code), Sales)
## Dickey-Fuller = -3.7337, Lag order = 6, p-value = 0.02271
## alternative hypothesis: stationary
## 
## [1] "N05C"
## 
##  Augmented Dickey-Fuller Test
## 
## data:  pull(filter(df_drug_weekly, Drug == drug_code), Sales)
## Dickey-Fuller = -6.1986, Lag order = 6, p-value = 0.01
## alternative hypothesis: stationary
## 
## [1] "R03"
## 
##  Augmented Dickey-Fuller Test
## 
## data:  pull(filter(df_drug_weekly, Drug == drug_code), Sales)
## Dickey-Fuller = -3.4184, Lag order = 6, p-value = 0.05161
## alternative hypothesis: stationary
## 
## [1] "R06"
## 
##  Augmented Dickey-Fuller Test
## 
## data:  pull(filter(df_drug_weekly, Drug == drug_code), Sales)
## Dickey-Fuller = -3.772, Lag order = 6, p-value = 0.0208
## alternative hypothesis: stationary

Add some implications .

3. Forecasting